Hybrid sequence-based analysis reveals the distribution of bacterial species and genes in the oral microbiome at a high resolution

Bacteria in the oral microbiome are poorly identified owing to the lack of established culture methods for them. Thus, this study aimed to use culture-free analysis techniques, including bacterial single-cell genome sequencing, to identify bacterial species and investigate gene distribution in saliva. Saliva samples from the same individual were classified as inactivated or viable and then analyzed using 16S rRNA sequencing, metagenomic shotgun sequencing, and bacterial single-cell sequencing. The results of 16S rRNA sequencing revealed similar microbiota structures in both samples, with Streptococcus being the predominant genus. Metagenomic shotgun sequencing showed that approximately 80 % of the DNA in the samples was of non-bacterial origin, whereas single-cell sequencing showed an average contamination rate of 10.4 % per genome. Single-cell sequencing also yielded genome sequences for 43 out of 48 wells for the inactivated samples and 45 out of 48 wells for the viable samples. With respect to resistance genes, four out of 88 isolates carried cfxA, which encodes a β-lactamase, and four isolates carried erythromycin resistance genes. Tetracycline resistance genes were found in nine bacteria. Metagenomic shotgun sequencing provided complete sequences of cfxA, ermF, and ermX, whereas other resistance genes, such as tetQ and tetM, were detected as fragments. In addition, virulence factors from Streptococcus pneumoniae were the most common, with 13 genes detected. Our average nucleotide identity analysis also suggested five single-cell-isolated bacteria as potential novel species. These data would contribute to expanding the oral microbiome data resource.


Introduction
The oral microbiome is the collection of microorganisms that inhabit the oral cavity and are present in human saliva. The oral microbiome is influenced by various factors, including diet, oral hygiene, age, and health status [1]. In general, a healthy oral microbiome is dominated by bacteria belonging to the phyla Firmicutes, Actinobacteria, and Proteobacteria, including Streptococcus, Rothia, and Neisseria [1,2]. The oral microbiome plays an important role in maintaining oral and general health. Some bacteria in the oral microbiome produce antimicrobial compounds that prevent the growth of harmful bacteria in the oral cavity [3]. Oral microbiome dysbiosis is associated with systemic health conditions, such as cardiovascular diseases, diabetes, and respiratory infections [1,2].
Microbiome analyses have provided additional information regarding the structure of the oral microbiota. However, many unculturable bacteria are present, and the individual bacteria or genes in the oral microbiota remain unidentified [2]. Similar to the gut, the oral cavity may facilitate the accumulation of diverse bacteria and cross-species transfer of genes. Some novel capsular types of Streptococcus pneumoniae and drug resistance genes are derived from oral Streptococcus [4-7]. In addition to cross-species gene transfer, mobile genetic elements, such as plasmids containing antimicrobial resistance (AMR) genes, are transmissible between bacteria. Thus, droplet infection via saliva could be a global source of drug resistance. Amplicon and metagenomic shotgun sequencing have been used in microflora analysis. Specifically, 16S rRNA amplicon sequencing has been used to analyze the structure of the microflora, but its accuracy is insufficient to estimate phylogeny at the species level [8,9]. Metagenomic shotgun sequencing allows for the comprehensive analysis of gene structure. However, reconstructing individual bacterial genomes from reads via de novo assembly and binning is challenging [10]. Additionally, in-depth analysis requires a large number of reads, which increases the costs of sequencing and computational resources. Moreover, not all reads obtained can be used to analyze the bacterial fraction owing to contamination with host DNA.
Bacterial single-cell sequencing has also been used for microflora analysis [11,12]. However, bacteria are approximately 1/10th the size of host cells and contain approximately 1/1000th the amount of DNA [13-15]. A previous study that used single-cell sequencing only achieved an average genome completeness of approximately 14% in 180 single bacterial cells [16]. Using single-amplified genome (SAG) gel technology, Hosokawa et al. obtained an average genome completeness of 31.8% in 346 isolates [11].
Considering the lack of culture methods for identifying bacterial species in the oral microflora, we aimed to use amplicon, metagenomic shotgun, and single-cell sequencing to identify bacterial species and investigate gene distribution in saliva samples. The bacterial genomes obtained showed a distribution of virulence factors and resistance genes with high accuracy and indicated the possibility of a novel genus.
Materials and Methods

Sample preparation
After fasting for >30 min in the morning, a healthy donor provided 5 mL of saliva. A 1 mL aliquot of the sample was added to OMNIgene ORAL solution (DNA Genotek Inc., Canada), which inactivates but stabilizes bacterial cells, and then stored at room temperature until single-cell sequencing. The presence of live bacteria was investigated as follows. In brief, 3 mL of saliva was centrifuged at 8000 rpm for 10 min, suspended in 800 µL of 50% glycerol/RPMI1640 solution, and then stored at -30°C until single-cell sequencing. The 16S rRNA sequencing, metagenome shotgun sequencing, single-cell isolation, genome amplification, and paired-end genome sequencing of both saliva samples were performed by bitBiome Inc. in Japan.

Sequencing-based profiling
Sequencing-based profiling was performed as previously described [29]. Quality control and preprocessing of the FASTQ files from the next-generation sequencing were performed using fastp v.0.20.0 [30]. To identify the bacterial species, we performed an average nucleotide identity (ANI) analysis of the assemblies by using Microbial Genomes Atlas MiGA online (http://microbial-genomes.org/) [31].
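As an illustration of the quality-control step, the following minimal sketch assembles a paired-end fastp command line. The file names and the output naming scheme are hypothetical, and the study does not report which fastp options were used, so only the basic input/output and report flags are shown. The command is printed rather than executed.

```python
import shlex


def build_fastp_command(read1, read2, out_prefix):
    """Build a paired-end fastp call (input/output and report flags only).

    File names are hypothetical; the options actually used in the study
    are not stated in the text.
    """
    return [
        "fastp",
        "-i", read1,                               # forward reads in
        "-I", read2,                               # reverse reads in
        "-o", f"{out_prefix}_R1.clean.fastq.gz",   # forward reads out
        "-O", f"{out_prefix}_R2.clean.fastq.gz",   # reverse reads out
        "--json", f"{out_prefix}.fastp.json",      # machine-readable QC report
        "--html", f"{out_prefix}.fastp.html",      # human-readable QC report
    ]


cmd = build_fastp_command("sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample")
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run(cmd, check=True) on a system where fastp is installed.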
The AMR and virulence factor profiles were determined using ARIBA 2.14.4, with cleaned sequencing data [32]. We used the Comprehensive Antibiotic Resistance Database (CARD) v.3.0.8 [33] and core and full datasets of the virulence factor database (VFDB) [34] as reference for ST, AMR, and virulence factor profiling, respectively. The minimum percentage identities for the assemblies were set to 93 and 90 for the CARD and other databases, respectively. The analyzed data were visualized using Phandango [35].
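The database-specific identity cut-offs described above can be sketched as a simple filter. The thresholds (93 for CARD, 90 for other databases) are taken from the text. The record layout, gene names, and identity values are hypothetical placeholders and do not represent ARIBA output or results from this study.

```python
# Minimum percentage identity per reference database (values from the text).
MIN_IDENTITY = {"CARD": 93.0}
DEFAULT_MIN_IDENTITY = 90.0  # applied to every other database


def passes_threshold(database, percent_identity):
    """Return True if a hit meets the minimum identity for its database."""
    return percent_identity >= MIN_IDENTITY.get(database, DEFAULT_MIN_IDENTITY)


def filter_hits(hits):
    """Keep only hits that meet the cut-off of the database they came from."""
    return [h for h in hits if passes_threshold(h["database"], h["identity"])]


# Hypothetical hits used only to exercise the filter.
example_hits = [
    {"gene": "geneA", "database": "CARD", "identity": 92.5},
    {"gene": "geneB", "database": "CARD", "identity": 97.1},
    {"gene": "geneC", "database": "VFDB", "identity": 91.0},
    {"gene": "geneD", "database": "VFDB", "identity": 85.0},
]
kept = [h["gene"] for h in filter_hits(example_hits)]
print(kept)
```

In this sketch a CARD hit at 92.5% identity is discarded, whereas a VFDB hit at 91.0% is retained, because the stricter cut-off applies only to CARD.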

Ethical statement
The study was conducted with written informed consent from the donor and approved by the Institutional Review Board of Osaka University Graduate School of Dentistry (R4-E4).

Comparison of 16S rRNA, metagenome shotgun, and bacterial single-cell sequencing on the human salivary microbiome
The results of the 16S rRNA sequencing, metagenome shotgun sequencing, 48-well single-cell isolation, and short-read genome sequencing of the inactivated and culturable samples are shown in Figure S1. Taxonomic bar plots at the genus level based on 16S rRNA analysis revealed that the inactivated and culturable samples had similar microbiome structures. In both samples, Streptococcus was the predominant genus, followed by Prevotella, Neisseria, and Veillonella (Figure 1A and Table S1). For bacterial single-cell analysis, genome sequences were obtained from 43 out of 48 wells for the OMNIgene-preserved samples and from 45 out of 48 wells for the glycerol stock samples. Genomic completeness greater than 80% was achieved in 17 wells for the OMNIgene-preserved samples and in 24 wells for the glycerol stock samples compared with known genomic sequences (Figure 1B and Table S2).
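The well counts reported above can be converted to recovery rates with a short calculation. The counts are taken from the text; the variable names and the choice of denominators (all 48 wells, or only the wells that yielded a genome) are assumptions made for illustration.

```python
WELLS_PER_SAMPLE = 48  # single-cell isolation plate size stated in the text

# Counts reported in the text for each preservation method.
samples = {
    "OMNIgene-preserved": {"genomes": 43, "completeness_over_80": 17},
    "glycerol stock": {"genomes": 45, "completeness_over_80": 24},
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


summary = {}
for name, counts in samples.items():
    summary[name] = {
        # share of wells that yielded any genome sequence
        "recovery_pct": percent(counts["genomes"], WELLS_PER_SAMPLE),
        # share of wells with >80% genome completeness
        "high_quality_pct_of_wells": percent(
            counts["completeness_over_80"], WELLS_PER_SAMPLE
        ),
        # share of recovered genomes with >80% genome completeness
        "high_quality_pct_of_genomes": percent(
            counts["completeness_over_80"], counts["genomes"]
        ),
    }

total_genomes = sum(c["genomes"] for c in samples.values())

for name, stats in summary.items():
    print(name, stats)
print("total genomes:", total_genomes)
```

The total of 88 genomes matches the number of genomes analyzed in the subsequent ANI and resistance gene analyses.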
Similar to the 16S rRNA sequencing results, single-cell sequencing results showed that Streptococcus was the most abundant genus in the samples, followed by Prevotella. By contrast, the percentage of Neisseria, which was high in the 16S rRNA sequencing, was low, and Veillonella and Alloprevotella were not detected. In addition, 60 bacterial genera were detected using 16S rRNA sequencing, whereas only 17 genera were detected using single-cell sequencing (Figure 1A and 1C, and Table S2).
The total raw read counts for the metagenomic shotgun and single-cell analyses were 61,126,868 and 55,918,930, respectively (Tables S2 and S3). Metagenomic shotgun sequencing revealed an average contamination rate of 81.6%, indicating the difficulty in separating bacterial DNA from human saliva-derived specimens (Figure 1D and Table S3). By contrast, bacterial single-cell sequencing obtained a much lower average contamination rate of 10.4% per genome because of the single-cell separation process (Figure 1D and Table S2). Metagenome binning yielded nine bins from metagenome assemblies, of which eight were identified by metagenome shotgun sequencing and GTDBtk analysis, and 44 strains were identified at the species level by bacterial single-cell sequencing and GTDBtk analyses (Tables S2 and 1).
Although the host species of several of the detected pneumococcal virulence factors remained unclear, several pneumococcal virulence factors were harbored by oral streptococci. Neisseria ctrC was detected in a single-cell isolate, OSU002-0007, which was predicted to be Neisseria mucosa. Another single-cell isolate containing Neisseria lbpA was not identified as this species. Bacterial single-cell sequencing allowed us to determine which bacteria carry specific genes, which is difficult to achieve with metagenomic shotgun sequencing.
In the MiGA ANI analysis, 10 of the 88 genomes met the >95% ANI criterion for species identification (Table 2). Although several genomes were predicted to be S. pneumoniae through GTDBtk taxonomy analysis (Table S2), MiGA ANI analysis showed that S. pneumoniae had the highest ANI value (91.4%) in only one genome (OSU002-0038; Table 2). Despite the high genomic completeness of 99% determined by CheckM, the ANI value was less than 95%, indicating that effectively no bacteria in the samples were identified as S. pneumoniae. S. pneumoniae belongs to the mitis group of oral Streptococcus and cannot be distinguished from S. mitis or S. oralis through 16S rRNA sequencing, and some strains are difficult to identify even by biochemical tests [29,40,41]. Therefore, care must be taken when distinguishing bacteria from closely related species at the species level. ANI analysis also suggested five single-cell-isolated bacteria as potential novel species (Table 2). Bacterial single-cell sequencing could be a powerful tool for searching for novel bacterial species in the microbiome.
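The species-level decision rule described above can be expressed as a minimal sketch. The 95% cut-off and the values for OSU002-0038 are taken from the text. The function name and return convention are hypothetical and do not represent the MiGA implementation.

```python
SPECIES_ANI_THRESHOLD = 95.0  # percent; species identification criterion in the text


def species_call(best_match, ani_percent):
    """Return the best-matching species only if ANI exceeds the cut-off.

    None means the genome is not assigned to the best-matching species,
    however closely related that species may be.
    """
    if ani_percent > SPECIES_ANI_THRESHOLD:
        return best_match
    return None


# OSU002-0038: the closest reference was S. pneumoniae at 91.4% ANI (from the text).
call = species_call("Streptococcus pneumoniae", 91.4)
print(call)
```

Under this rule, OSU002-0038 is not assigned to S. pneumoniae even though S. pneumoniae is its closest reference, which mirrors the interpretation given above.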

Discussion
Saliva plays a multifaceted role in digestion, tooth remineralization, and oral cavity cleaning. However, it is also a vector of droplet infections. Daily activities such as talking, coughing, and sneezing produce large amounts of respiratory droplets that are subsequently deposited on dry surfaces [42]. These droplets facilitate the transmission of bacteria such as S. pneumoniae [43]. However, many microorganisms in the saliva are unidentified. Identifying the salivary microflora is crucial to understand the mode of human-to-human transmission of microorganisms and/or AMR genes. S. pneumoniae can acquire genetic material from oral streptococcal species, resulting in drug resistance and serotype replacement [4-7]. Jensen et al. comprehensively assessed the involvement of homologous recombination between oral streptococci and S. pneumoniae in acquiring β-lactam drug resistance. This study focused on the diversity at the DNA and amino acid levels of the transpeptidase region of pbp2x in 107 strains, pbp2b in 96 strains, and pbp1a in 88 strains of oral Streptococcus [4]. The findings revealed that polymorphic sites arising from spontaneous mutations in pbp accounted for 39% of all polymorphic sites observed in susceptible and resistant strains of S. mitis, S. oralis, and Streptococcus infantis. By contrast, extensive sequence variation was observed exclusively in resistant strains of S. pneumoniae. These results suggested that the previously diversified sequence in oral streptococci was imported by S. pneumoniae possibly because of the selective pressure exerted by antimicrobial agents.
In 2020, Ganaie et al. reported the discovery of the 100th pneumococcal capsular serotype, designated as 10D [7]. This study revealed that the capsular synthesis genes of serotype 10D exhibited three large regions of homology with genes arranged in the same order (syntenic) as those found in serotypes 6C and 39 and the capsular synthesis genes of S. mitis SK145. Notably, the syntenic region of 10D with SK145 spanned approximately 6000 bp and included a short fragment of wciNα at the 5′ end. The presence of this nonfunctional wciNα fragment provided compelling evidence of interspecies gene transfer from oral streptococci to S. pneumoniae. Moreover, the sequence of wcrO 10D, a gene in the capsular synthesis gene cluster of serotype 10D, displayed low homology (40%-50% amino acid identity) with the wcrO genes of serotypes 33C, 34, 35F, and 36 despite the sequencing of the capsular synthesis gene cluster in over 20,000 pneumococcal strains. By contrast, wcrO 10D exhibited surprisingly high homology (94% amino acid identity) with RS00925 from S. mitis SK145. These findings suggest that the 100th capsular serotype 10D arose from the acquisition of the S. mitis gene by S. pneumoniae. Resistance and diversification of capsular types of S. pneumoniae pose significant threats to human health, and understanding the sources of this diversity is crucial. These results indicate that oral streptococci serve as an external genetic pool for pneumococci. Thus, elucidating the genetic diversity of the oral flora is important.
In the present study, saliva samples from the same individual were classified as inactivated or culturable and then analyzed using 16S rRNA sequencing, metagenomic shotgun sequencing, and bacterial single-cell sequencing. Both inactivated and culturable samples were suitable for analysis, but inactivated samples are preferable when dealing with samples that may pose a risk to analysts, such as those from COVID-19-infected individuals. Genomic sequencing facilitated the exploration of the metabolic systems of unculturable bacteria present in the samples, potentially allowing the cultivation of previously unculturable bacteria from culturable samples.
Traditionally, metagenomic shotgun sequencing has been used to study gene functions in the bacterial flora. However, metagenome shotgun analysis involves binning to identify bacterial species and then constructing a metagenome-assembled genome to search for gene distribution [44]. Although the binning technology has improved and contributed to the identification of many bacteria, the high homology of essential genes among species makes complete classification extremely difficult. In other words, identifying which bacteria possess the genes detected by metagenome shotgun sequencing is a future challenge and is expected to be improved by short- and long-read hybrid analysis, long-read deep sequencing combined with Hi-C-seq, and other techniques [45,46]. The single-cell analysis performed in this study proved to be highly effective, allowing the precise identification of genes present in individual bacteria. In addition, the use of preisolated bacteria eliminated the need for binning, which poses challenges in terms of improving accuracy. Sequencing DNA in a single bacterial cell can reveal the host of mobile genetic elements, such as plasmids and phages, whereas Hi-C-seq provides a community composition profile. Elucidating the dynamics of gene transfer between bacterial species at high frequencies is essential to understand the spread of resistance genes through saliva and develop effective control strategies. Future studies could combine multiple methods to elucidate the human microbiota structure and identify previously unrecognized bacterial species and genetic characteristics.

Figure 1

Figure 2

Figure 3